BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates generally to microprocessors, 
and more specifically to a RISC microprocessor having plural, 
symmetrical sets of registers. 



Description of the Background

In addition to the usual complement of main memory storage
and secondary permanent storage, a microprocessor-based computer
system typically also includes one or more general purpose data
registers and one or more status flags. Previous systems have
included integer registers for holding integer data and floating
point registers for holding floating point data. Typically, the
status flags are used for indicating certain conditions resulting
from the most recently executed operation. There generally are
status flags for indicating whether, in the previous operation, a
carry occurred, a negative number resulted, and/or a zero
resulted.

These flags prove useful in determining the outcome of
conditional branching within the flow of program control. For
example, if it is desired to compare a first number to a second
number and, upon the condition that the two are equal, to branch
to a given subroutine, the microprocessor may compare the two
numbers by subtracting one from the other, and setting or
clearing the appropriate condition flags. The numerical value
of the result of the subtraction need not be stored. A
conditional branch instruction may then be executed, conditioned
upon the status of the zero flag. While being simple to
implement, this scheme lacks flexibility and power. Once the
comparison has been performed, no further numerical or other
operations may be performed before the conditional branch upon
the appropriate flag; otherwise, the intervening instructions
will overwrite the condition flag values resulting from the
comparison, likely causing erroneous branching. The scheme is
further complicated by the fact that it may be desirable to form
greatly complex tests for branching, rather than the simple
equality example given above.
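The hazard just described may be illustrated with a short sketch.
The following Python model is illustrative only and is not taken
from any prior system; the class name FlagMachine and its methods
are hypothetical. It assumes a single fixed zero flag and negative
flag which every arithmetic operation rewrites.

```python
class FlagMachine:
    """Toy model of a processor with one fixed set of status flags."""

    def __init__(self):
        self.zero = False      # set when the last result was zero
        self.negative = False  # set when the last result was negative

    def _update(self, result):
        # Every arithmetic operation rewrites the same fixed flags.
        self.zero = (result == 0)
        self.negative = (result < 0)
        return result

    def compare(self, a, b):
        # Compare by subtracting; the numerical result is discarded.
        self._update(a - b)

    def add(self, a, b):
        return self._update(a + b)


m = FlagMachine()
m.compare(7, 7)
flag_right_after_compare = m.zero    # the operands were equal

m.add(2, 3)                          # an intervening instruction
flag_after_intervening_add = m.zero  # the comparison outcome is lost
```

In this sketch the zero flag reports equality immediately after
the comparison, but the intervening addition overwrites it before
a conditional branch could test it.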

For example, assume that the program should branch to the
subroutine only upon the condition that a first number is greater
than a second number, and a third number is less than a fourth
number, and a fifth number is equal to a sixth number. It would
be necessary for previous microprocessors to perform a lengthy
series of comparisons heavily interspersed with conditional
branches. A particularly undesirable feature of this serial
scheme of comparing and branching is observed in any
microprocessor having an instruction pipeline.

In a pipelined microprocessor, more than one instruction is
being executed at any given time, with the plural instructions
being in different stages of execution at any given moment. This
provides for vastly improved throughput. A typical pipeline
microprocessor may include pipeline stages for: (a) fetching an
instruction, (b) decoding the instruction, (c) obtaining the
instruction's operands, (d) executing the instruction, and
(e) storing the results. The problem arises when a conditional
branch instruction is fetched. It may be the case that the
conditional branch's condition cannot yet be tested, as the
operands may not yet be calculated, if they are to result from
operations which are yet in the pipeline. This results in a
"pipeline stall", which dramatically slows down the processor.

Another shortcoming of previous microprocessor-based systems
is that they have included only a single set of registers of any
given data type. In previous architectures, when an increased
number of registers has been desired within a given data type,
the solution has been simply to increase the size of the single
set of that type of registers. This may result in addressing
problems, access conflict problems, and symmetry problems.

On a similar note, previous architectures have restricted
each given register set to one respective numerical data type.
Various prior systems have allowed general purpose registers to
hold either numerical data or address "data", but the present
application will not use the term "data" to include addresses.
What is intended may be best understood with reference to two
prior systems. The Intel 8085 microprocessor includes a register
pair "HL" which can be used to hold either two bytes of numerical
data or one two-byte address. The present application's
improvement is not directed to that issue. More on point, the
Intel 80486 microprocessor includes a set of general purpose
integer data registers and a set of floating point registers,
with each set being limited to its respective data type, at least
for purposes of direct register usage by arithmetic and logic
units.

This proves wasteful of the microprocessor's resources, such
as the available silicon area, when the microprocessor is
performing operations which do not involve both data types. For
example, user applications frequently involve exclusively integer
operations, and perform no floating point operations whatsoever.
When such a user application is run on a previous microprocessor
which includes floating point registers (such as the 80486),
those floating point registers remain idle during the entire
execution.

Another problem with previous microprocessor register set
architecture is observed in context switching or state switching
between a user application and a higher access privilege level
entity such as the operating system kernel. When control within
the microprocessor switches context, mode, or state, the
operating system kernel or other entity to which control is
passed typically does not operate on the same data which the user
application has been operating on. Thus, the data registers
typically hold data values which are not useful to the new
control entity but which must be maintained until the user
application is resumed. The kernel must generally have registers
for its own use, but typically has no way of knowing which
registers are presently in use by the user application. In order
to make space for its own data, the kernel must swap out or
otherwise store the contents of a predetermined subset of the
registers. This results in considerable loss of processing time
to overhead, especially if the kernel makes repeated,
short-duration assertions of control.

On a related note, in prior microprocessors, when it is
required that a "grand scale" context switch be made, it has
been necessary for the microprocessor to expend even greater
amounts of processing resources, including a generally large
number of processing cycles, to save all data and state
information before making the switch. When context is switched
back, the same performance penalty has previously been paid, to
restore the system to its former state. For example, if a
microprocessor is executing two user applications, each of which
requires the full complement of registers of each data type, and
each of which may be in various stages of condition code setting
operations or numerical calculations, each switch from one user
application to the other necessarily involves swapping or
otherwise saving the contents of every data register and state
flag in the system. This obviously involves a great deal of
operational overhead, resulting in significant performance
degradation, particularly if the main or the secondary storage
to which the registers must be saved is significantly slower than
the microprocessor itself.

Therefore, we have discovered that it is desirable to have
an improved microprocessor architecture which allows the various
component conditions of a complex condition to be calculated
without any intervening conditional branches. We have further
discovered that it is desirable that the plural simple conditions
be calculable in parallel, to improve throughput of the
microprocessor.

We have also discovered that it is desirable to have an 
architecture which allows multiple register sets within a given 
data type. 

Additionally, we have discovered it to be desirable for a
microprocessor's floating point registers to be usable as integer
registers, in case the available integer registers are inadequate
to optimally hold the necessary amount of integer data.
Notably, we have discovered that it is desirable that such
re-typing be completely transparent to the user application.

We have discovered it to be highly desirable to have a
microprocessor which provides a dedicated subset of registers
which are reserved for use by the kernel in lieu of at least a
subset of the user registers, and that this new set of registers
should be addressable in exactly the same manner as the register
subset which they replace, in order that the kernel may use the
same register addressing scheme as user applications. We have
further observed that it is desirable that the switch between the
two subsets of registers require no microprocessor overhead
cycles, in order to maximally utilize the microprocessor's
resources.

Also, we have discovered it to be desirable to have a
microprocessor architecture which allows for a "grand scale"
context switch to be performed with minimal overhead. In this
vein, we have discovered that it is desirable to have an
architecture which allows for plural banks of register sets of
each type, such that two or more user applications may be
operating in a multi-tasking environment, or other "simultaneous"
mode, with each user application having sole access to at least
a full bank of registers. It is our discovery that the register
addressing scheme should, desirably, not differ between user
applications, nor between register banks, to maximize simplicity
of the user applications, and that the system should provide
hardware support for switching between the register banks so that
the user applications need not be aware of which register bank
they are presently using or even of the existence of other
register banks or of other user applications.

These and other advantages of our invention will be 
appreciated with reference to the following description of our 
invention, the accompanying drawings, and the claims.

SUMMARY OF THE INVENTION

The present invention provides a register file system
comprising: an integer register set including first and second
subsets of integer registers, and a shadow subset; a re-typable
set of registers which are individually usable as integer
registers or as floating point registers; and a set of
individually addressable Boolean registers.

The present invention includes integer and floating point
functional units which execute integer instructions accessing the
integer register set, and which operate in a plurality of modes.
In any mode, instructions are granted ordinary access to the
first subset of integer registers. In a first mode, instructions
are also granted ordinary access to the second subset. However,
in a second mode, instructions attempting to access the second
subset are instead granted access to the shadow subset, in a
manner which is transparent to the instructions. Thus, routines
may be written without regard to which mode they will operate in,
and system routines (which operate in the second mode) can have
at least the second subset seemingly at their disposal, without
having to expend the otherwise-required overhead of saving the
second subset's contents (which may be in use by user processes
operating in the first mode).

The invention further includes a plurality of integer
register sets, which are individually addressable as specified
by fields in instructions. The register sets include read ports
and write ports which are accessed by multiplexers, wherein the
multiplexers are controlled by contents of the register
set-specifying fields in the instructions.

One of the integer register sets is also usable as a
floating point register set. In one embodiment, this set is
sixty-four bits wide to hold double-precision floating point
data, but only the low order thirty-two bits are used by integer
instructions.

The invention includes functional units for performing
Boolean operations, and further includes a Boolean register set
for holding results of the Boolean operations such that no
dedicated, fixed-location status flags are required. The integer
and floating point functional units execute numerical comparison
instructions, which specify individual ones of the Boolean
registers to hold results of the comparisons. A Boolean
functional unit executes Boolean combinational instructions whose
sources and destination are specified registers in the Boolean
register set. Thus, the present invention may perform
conditional branches upon a single result of a complex Boolean
function without intervening conditional branch instructions
between the fundamental parts of the complex Boolean function,
minimizing pipeline disruption in the data processor.

Finally, there are multiple, identical register banks in the
system, each bank including the above-described register sets.
A bank may be allocated to a given process or routine, such that
the instructions within the routine need not specify upon which
bank they operate.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of the instruction execution unit
of the microprocessor of the present invention, showing the
elements of the register file.

Figs. 2-4 are simplified schematic and block diagrams of the
floating point, integer and Boolean portions of the instruction
execution unit of Fig. 1, respectively.

Figs. 5-6 are more detailed views of the floating point and
integer portions, respectively, showing the means for selecting
between register sets.

Fig. 7 illustrates the fields of an exemplary microprocessor
instruction word executable by the instruction execution unit of
Fig. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
I. REGISTER FILE 

Fig. 1 illustrates the basic components of the instruction
execution unit (IEU) 10 of the RISC (reduced instruction set
computing) processor of the present invention. The IEU 10
includes a register file 12 and an execution engine 14. The
register file 12 includes one or more register banks 16-0 to
16-n. It will be understood that the structure of each register
bank 16 is identical to all of the other register banks 16.
Therefore, the present application will describe only register
bank 16-0. The register bank includes a register set A 18, a
register set FB 20, and a register set C 22.

In general, the invention may be characterized as a RISC
microprocessor having a register file optimally configured for
use in the execution of RISC instructions, as opposed to
conventional register files which are sufficient for use in the
execution of CISC (complex instruction set computing)
instructions by CISC processors. By having a specially adapted
register file, the execution engine of the microprocessor's IEU
achieves greatly improved performance, both in terms of resource
utilization and in terms of raw throughput. The general concept
is to tune a register set to a RISC instruction, while the
specific implementation may involve any of the register sets in
the architecture.



A. Register Set A

Register set A 18 includes integer registers 24 (RA[31:0]),
each of which is adapted to hold an integer value datum. In one
embodiment, each integer may be thirty-two bits wide. The RA[]
integer registers 24 include a first plurality 26 of integer
registers (RA[23:0]) and a second plurality 28 of integer
registers (RA[31:24]). The RA[] integer registers 24 are each
of identical structure, and are each addressable in the same
manner, albeit with a unique address within the integer register
set 24. For example, a first integer register 30 (RA[0]) is
addressable at a zero offset within the integer register set 24.

RA[0] always contains the value zero. It has been observed
that user applications and other programs use the constant value
zero more than any other constant value. It is, therefore,
desirable to have a zero readily available at all times, for
clearing, comparing, and other purposes. Another advantage of
having a constant, hard-wired value in a given register,
regardless of the particular value, is that the given register
may be used as the destination of any instruction whose results
need not be saved.
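The behavior of a hard-wired zero register may be sketched as
follows. This Python model is a minimal illustration only; the
class name IntegerRegisters is hypothetical, and the sketch
assumes the thirty-two-bit width of the one embodiment and assumes
that a write to offset zero is silently discarded.

```python
class IntegerRegisters:
    """Toy model of RA[31:0] in which RA[0] is hard-wired to zero."""

    def __init__(self, count=32):
        self._regs = [0] * count

    def read(self, offset):
        # RA[0] reads as zero regardless of what was written to it.
        return 0 if offset == 0 else self._regs[offset]

    def write(self, offset, value):
        if offset == 0:
            return  # result is discarded; RA[0] stays zero
        self._regs[offset] = value & 0xFFFFFFFF  # keep thirty-two bits


ra = IntegerRegisters()
ra.write(0, 123)        # destination RA[0]: the result is not saved
ra.write(5, 2**32 + 7)  # an ordinary register keeps the low 32 bits
zero_value = ra.read(0)
ordinary_value = ra.read(5)
```

An instruction executed only for a side effect can thus name RA[0]
as its destination without disturbing any live register.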

Also, this means that the fixed register will never be the
cause of a data dependency delay. A data dependency exists when
a "slave" instruction requires, for one or more of its operands,
the result of a "master" instruction. In a pipelined processor,
this may cause pipeline stalls. For example, the master
instruction, although occurring earlier in the code sequence
than the slave instruction, may take considerably longer to
execute. It will be readily appreciated that if a slave
"increment and store" instruction operates on the result data of
a master "quadruple-word integer divide" instruction, the slave
instruction will be fetched, decoded, and awaiting execution many
clock cycles before the master instruction has finished
execution. However, in certain instances, the numerical result
of a master instruction is not needed, and the master instruction
is executed for some other purpose only, such as to set condition
code flags. If the master instruction's destination is RA[0],
the numerical results will be effectively discarded. The data
dependency checker (not shown) of the IEU 10 will not cause the
slave instruction to be delayed, as the ultimate result of the
master instruction (zero) is already known.

The integer register set A 24 also includes a set of shadow
registers 32 (RT[31:24]). Each shadow register can hold an
integer value, and is, in one embodiment, also thirty-two bits
wide. Each shadow register is addressable as an offset in the
same manner in which each integer register is addressable.

Finally, the register set A includes an IEU mode integer
switch 34. The switch 34, like other such elements, need not
have a physical embodiment as a switch, so long as the
corresponding logical functionality is provided within the
register sets. The IEU mode integer switch 34 is coupled to the
first subset 26 of integer registers on line 36, to the second
subset of integer registers 28 on line 38, and to the shadow
registers 32 on line 40. All accesses to the register set A 18
are made through the IEU mode integer switch 34 on line 42. Any
access request to read or write a register in the first subset
RA[23:0] is passed automatically through the IEU mode integer
switch 34. However, accesses to an integer register with an
offset outside the first subset RA[23:0] will be directed either
to the second subset RA[31:24] or the shadow registers RT[31:24],
depending upon the operational mode of the execution engine 14.

The IEU mode integer switch 34 is responsive to a mode
control unit 44 in the execution engine 14. The mode control
unit 44 provides pertinent state or mode information about the
IEU 10 to the IEU mode integer switch 34 on line 46. When the
execution engine performs a context switch such as a transfer
to kernel mode, the mode control unit 44 controls the IEU mode
integer switch 34 such that any requests to the second subset
RA[31:24] are re-directed to the shadow RT[31:24], using the same
requested offset within the integer set. Any operating system
kernel or other then-executing entity may thus have apparent
access to the second subset RA[31:24] without the
otherwise-required overhead of swapping the contents of the
second subset RA[31:24] out to main memory, or pushing the second
subset RA[31:24] onto a stack, or other conventional
register-saving technique.
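The re-direction performed by the mode integer switch may be
sketched in software as follows. This Python model is a minimal
illustration and not a description of the hardware; the class name
RegisterSetA and the attribute shadow_mode are hypothetical, with
shadow_mode standing in for the mode information supplied on line
46. The hard-wired zero in RA[0] is omitted for brevity.

```python
class RegisterSetA:
    """Toy model of RA[31:0] with shadow registers RT[31:24]."""

    SHADOW_BASE = 24  # offsets 24..31 form the second subset

    def __init__(self):
        self.ra = [0] * 32        # RA[31:0]
        self.rt = [0] * 8         # RT[31:24], stored at indices 0..7
        self.shadow_mode = False  # hypothetical stand-in for the mode signal

    def _locate(self, offset):
        # Offsets in the first subset always reach RA[23:0].
        if offset >= self.SHADOW_BASE and self.shadow_mode:
            return self.rt, offset - self.SHADOW_BASE
        return self.ra, offset

    def read(self, offset):
        bank, index = self._locate(offset)
        return bank[index]

    def write(self, offset, value):
        bank, index = self._locate(offset)
        bank[index] = value & 0xFFFFFFFF


regs = RegisterSetA()
regs.write(5, 10)         # first subset, shared by both modes
regs.write(30, 111)       # user value in RA[30]

regs.shadow_mode = True   # for example, a transfer to kernel mode
regs.write(30, 222)       # same offset, but the value lands in RT[30]
kernel_view = (regs.read(5), regs.read(30))

regs.shadow_mode = False  # return to user mode
user_view = (regs.read(5), regs.read(30))
```

The same offset is used in both modes, and the user value at that
offset survives the excursion without any save or restore step.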

When the execution engine 14 returns to normal user mode and
control passes to the originally-executing user application, the
mode control unit 44 controls the IEU mode integer switch 34 such
that access is again directed to the second subset RA[31:24].
In one embodiment, the mode control unit 44 is responsive to the
present state of interrupt enablement in the IEU 10. In one
embodiment, the execution engine 14 includes a processor status
register (PSR) (not shown), which includes a one-bit flag
(PSR[7]) indicating whether interrupts are enabled or disabled.
Thus, the line 46 may simply couple the IEU mode integer switch
34 to the interrupts-enabled flag in the PSR. While interrupts
are disabled, the IEU 10 maintains access to the integers
RA[23:0], in order that it may readily perform analysis of
various data of the user application. This may allow improved
debugging, error reporting, or system performance analysis.



B. Register Set FB 

The re-typable register set FB 20 may be thought of as
including floating point registers 48 (RF[31:0]) and/or integer
registers 50 (RB[31:0]). When neither data type is implied to
the exclusion of the other, this application will use the term
RFB[]. In one embodiment, the floating point registers RF[]
occupy the same physical silicon space as the integer registers
RB[]. In one embodiment, the floating point registers RF[] are
sixty-four bits wide and the integer registers RB[] are
thirty-two bits wide. It will be understood that if
double-precision floating point numbers are not required, the
register set RFB[] may advantageously be constructed in a
thirty-two-bit width to save the silicon area otherwise required
by the extra thirty-two bits of each floating point register.

Each individual register in the register set RFB[] may hold
either a floating point value or an integer value. The register
set RFB[] may include optional hardware for preventing accidental
access of a floating point value as though it were an integer
value, and vice versa. In one embodiment, however, in the
interest of simplifying the register set RFB[], it is simply
left to the software designer to ensure that no erroneous usages
of individual registers are made. Thus, the execution engine 14
simply makes an access request on line 52, specifying an offset
into the register set RFB[], without specifying whether the
register at the given offset is intended to be used as a floating
point register or an integer register. Within the execution
engine 14, various entities may use either the full sixty-four
bits provided by the register set RFB[], or may use only the low
order thirty-two bits, such as in integer operations or
single-precision floating point operations.
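The sharing of one physical register between the two data types
may be sketched as follows. This Python model is illustrative
only; the class name RetypableRegisters and its methods are
hypothetical. The sketch assumes the sixty-four-bit embodiment and
further assumes, since the text does not say, that an integer
write clears the upper thirty-two bits of the register.

```python
import struct


class RetypableRegisters:
    """Toy model of RFB[]: one 64-bit cell per offset, two views."""

    def __init__(self, count=32):
        self._bits = [0] * count  # raw 64-bit contents of each register

    def write_double(self, offset, value):
        # Store the IEEE double-precision bit pattern of the value.
        self._bits[offset] = struct.unpack("<Q", struct.pack("<d", value))[0]

    def read_double(self, offset):
        return struct.unpack("<d", struct.pack("<Q", self._bits[offset]))[0]

    def write_int(self, offset, value):
        # Assumption: an integer write clears the upper thirty-two bits.
        self._bits[offset] = value & 0xFFFFFFFF

    def read_int(self, offset):
        # Integer operations see only the low order thirty-two bits.
        return self._bits[offset] & 0xFFFFFFFF


rfb = RetypableRegisters()
rfb.write_double(3, 1.5)
as_double = rfb.read_double(3)
low_word_of_double = rfb.read_int(3)  # integer view of a floating point value

rfb.write_int(3, 42)                  # the same register re-used as an integer
as_int = rfb.read_int(3)
```

Nothing in the model records which type a register currently
holds, which mirrors the embodiment in which correct usage is left
to the software designer.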

A first register RFB[0] 51 contains the constant value zero,
in a form such that RB[0] is a thirty-two-bit integer zero
(0000 hex) and RF[0] is a sixty-four-bit floating point zero
(00000000 hex). This provides the same advantages as described
above for RA[0].



C. Register Set C

The register set C 22 includes a plurality of Boolean
registers 54 (RC[31:0]). RC[] is also known as the "condition
status register" (CSR). The Boolean registers RC[] are each
identical in structure and addressing, albeit that each is
individually addressable at a unique address or offset within
RC[].

In one embodiment, register set C further includes a
"previous condition status register" (PCSR) 60, and the register
set C also includes a CSR selector unit 62, which is responsive
to the mode control unit 44 to select alternatively between the
CSR 54 and the PCSR 60. In the one embodiment, the CSR is used
when interrupts are enabled, and the PCSR is used when interrupts
are disabled. The CSR and PCSR are identical in all other
respects. In the one embodiment, when interrupts are set to be
disabled, the CSR selector unit 62 pushes the contents of the CSR
into the PCSR, overwriting the former contents of the PCSR, and
when interrupts are re-enabled, the CSR selector unit 62 pops the
contents of the PCSR back into the CSR. In other embodiments it
may be desirable to merely alternate access between the CSR and
the PCSR, as is done with RA[31:24] and RT[31:24]. In any event,
the PCSR is always available as a thirty-two-bit "special
register".

None of the Boolean registers is a dedicated condition flag,
unlike the Boolean registers in previously known microprocessors.
That is, the CSR 54 does not include a dedicated carry flag, nor
a dedicated minus flag, nor a dedicated flag indicating
equality of a comparison or a zero subtraction result. Rather,
any Boolean register may be the destination of the Boolean result
of any Boolean operation. As with the other register sets, a
first Boolean register 58 (RC[0]) always contains the value zero,
to obtain the advantages explained above for RA[0]. In the
preferred embodiment, each Boolean register is one bit wide,
indicating one Boolean value.
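The use of individually addressable Boolean registers to build a
complex condition may be sketched as follows. This Python model is
illustrative only; the names BooleanRegisters and
complex_condition, and the choice of destination registers 1
through 5, are hypothetical. The sketch evaluates the condition
"a first number is greater than a second, and a third is less than
a fourth, and a fifth is equal to a sixth" with no branch until
the final result is available.

```python
class BooleanRegisters:
    """Toy model of RC[31:0]: one-bit registers, RC[0] fixed at zero."""

    def __init__(self, count=32):
        self._bits = [False] * count

    def read(self, offset):
        return False if offset == 0 else self._bits[offset]

    def write(self, offset, value):
        if offset != 0:  # a write to RC[0] is discarded
            self._bits[offset] = bool(value)


def complex_condition(a, b, c, d, e, f):
    rc = BooleanRegisters()
    # Three independent comparisons, each naming its own destination.
    rc.write(1, a > b)
    rc.write(2, c < d)
    rc.write(3, e == f)
    # Boolean combinational operations on the stored results.
    rc.write(4, rc.read(1) and rc.read(2))
    rc.write(5, rc.read(4) and rc.read(3))
    # Only at this point would a single conditional branch test RC[5].
    return rc.read(5)


branch_taken = complex_condition(5, 3, 1, 2, 9, 9)
branch_not_taken = complex_condition(5, 3, 1, 2, 9, 8)
```

Because the three comparisons write to different registers, none
of them overwrites another's result, and they have no ordering
constraint among themselves.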



II. EXECUTION ENGINE

The execution engine 14 includes one or more integer
functional units 66, one or more floating point functional units
68, and one or more Boolean functional units 70. The functional
units execute instructions as will be explained below. Buses 72,
73, and 75 connect the various elements of the IEU 10, and will
each be understood to represent data, address, and control paths.



A. Instruction Format

Fig. 7 illustrates one exemplary format for an integer
instruction which the execution engine 14 may execute. It will
be understood that not all instructions need to adhere strictly
to the illustrated format, and that the data processing system
includes an instruction fetcher and decoder (not shown) which are
adapted to operate upon varying format instructions. The single
example of Fig. 7 is for ease in explanation only. Throughout
this Application the identification I[] will be used to identify
various bits of the instruction. I[31:30] are reserved for
future implementations of the execution engine 14. I[29:26]
identify the instruction class of the particular instruction.
Table 1 shows the various classes of instructions performed by
the present invention.

                    TABLE 1

              Instruction Classes

Class    Instructions
-----    ---------------------------------
0-3      Integer and floating point
         register-to-register instructions
4        Immediate constant load
5        Reserved
6        Load
7        Store
8-11     Control Flow
12       Modifier
13       Boolean operations
14       Reserved
15       Atomic (extended)

Instruction classes of particular interest to this 
Application include the Class 0-3 register-to-register 
instructions and the Class 13 Boolean operations. While other 
classes of instructions also operate upon the register file 12, 
further discussion of those classes is not believed necessary in 
order to fully understand the present invention. 

I[25] is identified as B0, and indicates whether the
destination register is in register set A or register set B.
I[24:22] are an opcode which identifies, within the given
instruction class, which specific function is to be performed.
For example, within the register-to-register classes, an opcode
may specify "addition". I[21] identifies the addressing mode
which is to be used when performing the instruction: either
register source addressing or immediate source addressing.
I[20:16] identify the destination register as an offset within
the register set indicated by B0. I[15] is identified as B1 and
indicates whether the first operand is to be taken from register
set A or register set B. I[14:10] identify the register offset
from which the first operand is to be taken. I[9:8] identify a
function selection, which is an extension of the opcode I[24:22].
I[7:6] are reserved. I[5] is identified as B2 and indicates
whether a second operand of the instruction is to be taken from
register set A or register set B. Finally, I[4:0] identify the
register offset from which the second operand is to be taken.
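The field layout just described may be sketched as a simple
decoder. This Python fragment is illustrative only; the function
names bits and decode, and the dictionary keys, are hypothetical
labels for the fields I[29:26] through I[4:0]. The sketch does not
assume which value of B0, B1, or B2 denotes register set A and
which denotes register set B, as the text does not state it.

```python
def bits(word, hi, lo):
    """Return the field I[hi:lo] of a thirty-two-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    return {
        "class": bits(word, 29, 26),      # instruction class (Table 1)
        "b0": bits(word, 25, 25),         # register set of the destination
        "opcode": bits(word, 24, 22),     # function within the class
        "addr_mode": bits(word, 21, 21),  # register or immediate source
        "dest": bits(word, 20, 16),       # destination register offset
        "b1": bits(word, 15, 15),         # register set of the first operand
        "src1": bits(word, 14, 10),       # first operand register offset
        "func": bits(word, 9, 8),         # function selection
        "b2": bits(word, 5, 5),           # register set of the second operand
        "src2": bits(word, 4, 0),         # second operand register offset
    }


# An arbitrary word assembled field by field, for illustration only.
word = ((2 << 26) | (1 << 25) | (5 << 22) | (1 << 21) | (17 << 16)
        | (0 << 15) | (9 << 10) | (3 << 8) | (1 << 5) | 30)
fields = decode(word)
```

The reserved fields I[31:30] and I[7:6] are deliberately left
undecoded.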

With reference to Fig. 1, the integer functional unit 66 and
floating point functional unit 68 are equipped to perform integer
comparison instructions and floating point comparisons,
respectively. The instruction format for the comparison
instruction is substantially identical to that shown in Fig. 7,
with the caveat that various fields may advantageously be
identified by slightly different names. I[20:16] identifies the
destination register where the result is to be stored, but the
addressing mode field I[21] does not select between register sets
A or B. Rather, the addressing mode field indicates whether the
second source of the comparison is found in a register or is
immediate data. Because the comparison is a Boolean type
instruction, the destination register is always found in register
set C. All other fields function as shown in Fig. 7. In
performing Boolean operations within the integer and floating
point functional units, the opcode and function select fields
indicate which Boolean condition is to be tested for in comparing
the two operands. The integer and the floating point functional
units fully support the IEEE standards for numerical comparisons.

The IEU 10 is a load/store machine. This means that when
the contents of a register are stored to memory or read from
memory, an address calculation must be performed in order to
determine which location in memory is to be the destination or
the source of the store or load, respectively. When this is
the case, the destination register field I[20:16] identifies the
register which is the destination or the source of the load or
store, respectively. The source register 1 field, I[14:10],
identifies a register in either set A or B which contains a base
address of the memory location. In one embodiment, the source
register 2 field, I[4:0], identifies a register in set A or set
B which contains an index or an offset from the base. The
load/store address is calculated by adding the index to the base.
In another mode, I[7:0] include immediate data which are to be
added as an index to the base.
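The two address calculation modes may be sketched as follows.
This Python fragment is illustrative only and rests on several
assumptions which the text does not confirm: that a selector bit
of 1 denotes register set B, that the immediate field I[7:0] is
unsigned, that addresses wrap at thirty-two bits, and that the
choice of mode is supplied separately (here as the hypothetical
argument immediate_mode). The function names are likewise
hypothetical.

```python
def field(word, hi, lo):
    """Return the field I[hi:lo] of a thirty-two-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def load_store_address(word, set_a, set_b, immediate_mode):
    # Assumption: a selector bit of 1 means register set B, 0 means set A.
    base_set = set_b if field(word, 15, 15) else set_a
    base = base_set[field(word, 14, 10)]      # source register 1: base
    if immediate_mode:
        index = field(word, 7, 0)             # assumed unsigned immediate
    else:
        index_set = set_b if field(word, 5, 5) else set_a
        index = index_set[field(word, 4, 0)]  # source register 2: index
    return (base + index) & 0xFFFFFFFF        # assumed 32-bit wraparound


set_a = [0] * 32
set_b = [0] * 32
set_a[3] = 0x1000  # base address held in register 3 of set A
set_b[4] = 0x20    # index held in register 4 of set B

reg_word = (0 << 15) | (3 << 10) | (1 << 5) | 4
imm_word = (0 << 15) | (3 << 10) | 0x7F

reg_address = load_store_address(reg_word, set_a, set_b, immediate_mode=False)
imm_address = load_store_address(imm_word, set_a, set_b, immediate_mode=True)
```

In the immediate mode the field I[7:0] overlaps the bits which
otherwise carry B2 and the source register 2 offset, so the two
interpretations are mutually exclusive.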



B. Operation of the Instruction Execution Unit and
Register Sets

It will be understood by those skilled in the art that the
integer functional unit 66, the floating point functional unit
68, and the Boolean functional unit 70 are responsive to the
contents of the instruction class field, the opcode field, and
the function select field of a present instruction being
executed.

1. Integer Operations
For example, when the instruction class, the opcode, and 
function select indicate that an integer register-to-register 
addition is to be performed, the integer functional unit may be 
responsive thereto to perform the indicated operation, while the 
floating point functional unit and the Boolean functional unit 
may be responsive thereto to not perform the operation. As will 
be understood from the cross-referenced applications, however, 
the floating point functional unit 68 is equipped to perform both 
floating point and integer operations. Also, the functional 
units are constructed to each perform more than one instruction 
simultaneously. 

The integer functional unit 66 performs integer functions 
only. Integer operations typically involve a first source, a 
second source, and a destination. A given integer instruction 
will specify a particular operation to be performed on one or 
more source operands and will specify that the result of the 
integer operation is to be stored at a given destination. In 
some instructions, such as address calculations employed in 
load/store operations, the sources are utilized as a base and 
an index. The integer functional unit 66 is coupled to a first
bus 72 over which the integer functional unit 66 is connected to
a switching and multiplexing control (SMC) unit A 74 and an SMC
unit B 76. Each integer instruction executed by the integer
functional unit 66 will specify whether each of its sources and
destination reside in register set A or register set B.
Suppose that the IEU 10 has received, from the instruction
fetch unit (not shown), an instruction to perform an integer
register-to-register addition. In various embodiments, the
instruction may specify a register bank, perhaps even a separate
bank for each source and destination. In one embodiment, the
instruction I[] is limited to a thirty-two-bit length, and does
not contain any indication of which register bank 16-0 through
16-n is involved in the instruction. Rather, the bank selector
unit 78 controls which register bank is presently active. In
one embodiment, the bank selector unit 78 is responsive to one
or more bank selection bits in a status word (not shown) within
the IEU 10.

In order to perform the integer addition instruction, the
integer functional unit 66 is responsive to the identification
in I[14:10] and I[4:0] of the first and second source registers.
The integer functional unit 66 places an identification of the
first and second source registers at ports S1 and S2,
respectively, onto the integer functional unit bus 72 which is
coupled to both SMC units A and B 74 and 76. In one embodiment,
the SMC units A and B are each coupled to receive B0-2 from the
instruction I[]. In one embodiment, a zero in any respective Bn
indicates register set A, and a one indicates register set B.
During load/store operations, the source ports of the integer
and floating point functional units 66 and 68 are utilized as a
base port and an index port, B and I, respectively.


After obtaining the first and second operands from the
indicated register sets on the bus 72, as explained below, the
integer functional unit 66 performs the indicated operation upon
those operands, and provides the result at port D onto the
integer functional unit bus 72. The SMC units A and B are
responsive to B0 to route the result to the appropriate register
set A or B.
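
The register set selection just described can be illustrated with a
short sketch. It assumes that B0 selects the set for the destination,
B1 the set for the first source, and B2 the set for the second
source, which follows the multiplexer description given later in this
document. The function names, the list-based register sets, and the
thirty-two-bit wraparound are hypothetical and serve only the
example.

```python
def select_set(bit, set_a, set_b):
    """A zero selects register set A; a one selects register set B."""
    return set_b if bit else set_a


def integer_add(set_a, set_b, dest, src1, src2, b0, b1, b2):
    """Model a register-to-register add whose operands may sit in either set."""
    operand1 = select_set(b1, set_a, set_b)[src1]
    operand2 = select_set(b2, set_a, set_b)[src2]
    result = (operand1 + operand2) & 0xFFFFFFFF  # assumption: 32-bit wraparound
    # B0 routes the result to the destination register in set A or set B.
    select_set(b0, set_a, set_b)[dest] = result
    return result


# Hypothetical register contents: first source in set A, second in set B.
set_a = [0] * 32
set_b = [0] * 32
set_a[4] = 7
set_b[5] = 35
total = integer_add(set_a, set_b, dest=6, src1=4, src2=5, b0=1, b1=0, b2=1)
```

Because each of the three bits is independent, the sources and the
destination of one instruction may be spread across both sets.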

The SMC unit B is further responsive to the instruction
class, opcode, and function selection to control whether operands
are read from (or results are stored to) either a floating point
register RF[] or an integer register RB[]. As indicated, in one
embodiment, the registers RF[] may be sixty-four bits wide while
the registers RB[] are only thirty-two bits wide. Thus, SMC
unit B controls whether a word or a double word is written to the
register set RFB[]. Because all registers within register set
A are thirty-two bits wide, SMC unit A need not include means for
controlling the width of data transfer on the bus 42.

All data on the bus 42 are thirty-two bits wide, but other
sorts of complexities exist within register set A. The IEU mode
integer switch 34 is responsive to the mode control unit 44 of
the execution engine 14 to control whether data on the bus 42 are
connected through to bus 36, bus 38 or bus 40, and vice versa.

IEU mode integer switch 34 is further responsive to
I[20:16], I[14:10], and I[4:0]. If a given indicated destination
or source is in RA[23:0], the IEU mode integer switch 34
automatically couples the data between lines 42 and 36. However,
for registers RA[31:24], the IEU mode integer switch 34
determines whether data on line 42 is connected to line 38 or
line 40, and vice versa. When interrupts are enabled, IEU mode
integer switch 34 connects the SMC unit A to the second subset
28 of integer registers RA[31:24]. When interrupts are disabled,
the IEU mode integer switch 34 connects the SMC unit A to the
shadow registers RT[31:24]. Thus, an instruction executing
within the integer functional unit 66 need not be concerned with
whether to address RA[31:24] or RT[31:24]. It will be understood
that SMC unit A may advantageously operate identically whether
it is being accessed by the integer functional unit 66 or by the
floating point functional unit 68.
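
The behavior of the mode switch can be sketched in software as
follows. This is a hypothetical model, not the hardware design: the
class name RegisterSetA, its attribute names, and the use of a plain
Boolean attribute in place of the mode control signal are all
assumptions made for illustration.

```python
class RegisterSetA:
    """Toy model of register set A with shadow registers for indices 24-31."""

    def __init__(self):
        self.ra = [0] * 32  # RA[31:0]
        self.rt = [0] * 32  # only entries 24-31 are used, modeling RT[31:24]
        self.interrupts_enabled = True  # stands in for the mode control signal

    def _bank(self, index):
        # RA[23:0] is always reached directly. For indices 24-31 the
        # switch picks RA when interrupts are enabled and RT otherwise.
        if index < 24 or self.interrupts_enabled:
            return self.ra
        return self.rt

    def read(self, index):
        return self._bank(index)[index]

    def write(self, index, value):
        self._bank(index)[index] = value


regs = RegisterSetA()
regs.write(30, 111)              # lands in RA[30]
regs.interrupts_enabled = False
regs.write(30, 222)              # same register number, lands in RT[30]
shadow_value = regs.read(30)
regs.interrupts_enabled = True
normal_value = regs.read(30)
```

The caller uses the same register number in both modes, which mirrors
the point that an executing instruction need not know which physical
registers it reaches.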



2. Floating Point Operations
The floating point functional unit 68 is responsive to the
class, opcode, and function select fields of the instruction, to
perform floating point operations. The S1, S2, and D ports
operate as described for the integer functional unit 66. SMC
unit B is responsive to retrieve floating point operands from,
and to write numerical floating point results to, the floating
point registers RF[] on bus 52.



3. Boolean Operations
SMC unit C 80 is responsive to the instruction class,
opcode, and function select fields of the instruction I[]. When
SMC unit C detects that a comparison operation has been performed
by one of the numerical functional units 66 or 68, it writes the
Boolean result over bus 56 to the Boolean register indicated at
the D port of the functional unit which performed the comparison.

The Boolean functional unit 70 does not perform comparison
instructions as do the integer and floating point functional
units 66 and 68. Rather, the Boolean functional unit 70 is only
used in performing bitwise logical combination of Boolean
register contents, according to the Boolean functions listed in
Table 2.




TABLE 2
Boolean Functions

Code    Boolean result calculation
0000    ZERO
0001    S1 AND S2
0010    S1 AND (NOT S2)
0011    S1
0100    (NOT S1) AND S2
0101    S2
0110    S1 XOR S2
0111    S1 OR S2
1000    S1 NOR S2
1001    S1 XNOR S2
1010    NOT S2
1011    S1 OR (NOT S2)
1100    NOT S1
1101    (NOT S1) OR S2
1110    S1 NAND S2
1111    ONE
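
The sixteen codes of Table 2 can be read as truth tables of the
functions they select: bit 0 of the code is the result when both
sources are one, bit 1 when only the first source is one, bit 2 when
only the second source is one, and bit 3 when both are zero. This
reading is an observation drawn from the table entries, not a
statement made in the text, and the function name below is
hypothetical. A minimal sketch under that assumption:

```python
def boolean_function(code, s1, s2):
    """Evaluate the Table 2 function selected by a four-bit code."""
    if not 0 <= code <= 0b1111:
        raise ValueError("code must fit in four bits")
    # Pick the truth-table bit for this input pair:
    # (1,1) -> bit 0, (1,0) -> bit 1, (0,1) -> bit 2, (0,0) -> bit 3.
    position = ((not s1) << 1) | (not s2)
    return (code >> position) & 1


and_result = boolean_function(0b0001, 1, 1)   # S1 AND S2
xor_result = boolean_function(0b0110, 1, 1)   # S1 XOR S2
nand_result = boolean_function(0b1110, 1, 1)  # S1 NAND S2
```

Under this reading a single lookup covers all sixteen functions, so
no per-function logic is needed in the sketch.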



The advantage which the present invention obtains by having 
a plurality of homogenous Boolean registers, each of which is 
individually addressable as the destination of a Boolean 
operation, will be explained with reference to Tables 3-5. Table 
3 illustrates an example of a segment of code which performs a 
conditional branch based upon a complex Boolean function. The
complex Boolean function includes three portions which are OR-ed
together. The first portion includes two sub-portions, which
are AND-ed together.



TABLE 3
Example of Complex Boolean Function

1  RA[1] := 0;
2  IF (((RA[2] = RA[3]) AND (RA[4] > RA[5])) OR
3      (RA[6] < RA[7]) OR
4      (RA[8] <> RA[9])) THEN
5      X()
6  ELSE
7      Y();
8  RA[10] := 1;



Table 4 illustrates, in pseudo-assembly form, one likely
method by which previous microprocessors would perform the
function of Table 3. The code in Table 4 is written as though
it were constructed by a compiler of at least normal intelligence
operating upon the code of Table 3. That is, the compiler will
recognize that the condition expressed in lines 2-4 of Table 3
is passed if any of the three portions is true.

TABLE 4
Execution of Complex Boolean Function
Without Boolean Register Set

 1  START      LDI  RA[1],0
 2  TEST1      CMP  RA[2],RA[3]
 3             BNE  TEST2
 4             CMP  RA[4],RA[5]
 5             BGT  DO_IF
 6  TEST2      CMP  RA[6],RA[7]
 7             BLT  DO_IF
 8  TEST3      CMP  RA[8],RA[9]
 9             BEQ  DO_ELSE
10  DO_IF      JSR  ADDRESS OF X()
11             JMP  PAST_ELSE
12  DO_ELSE    JSR  ADDRESS OF Y()
13  PAST_ELSE  LDI  RA[10],1


The assignment at line 1 of Table 3 is performed by the
"load immediate" statement at line 1 of Table 4. The first
portion of the complex Boolean condition, expressed at line 2
of Table 3, is represented by the statements in lines 2-5 of
Table 4. To test whether RA[2] equals RA[3], the compare
statement at line 2 of Table 4 performs a subtraction of RA[2]
from RA[3] or vice versa, depending upon the implementation, and
may or may not store the result of that subtraction. The
important function performed by the comparison statement is that
the zero, minus, and carry flags will be appropriately set or
cleared.

The conditional branch statement at line 3 of Table 4
branches to a subsequent portion of code upon the condition that
RA[2] did not equal RA[3]. If the two were unequal, the zero
flag will be clear, and there is no need to perform the second
sub-portion. The existence of the conditional branch statement
at line 3 of Table 4 prevents the further fetching, decoding, and
executing of any subsequent statement in Table 4 until the
results of the comparison in line 2 are known, causing a pipeline
stall. If the first sub-portion of the first portion (TEST1) is
passed, the second sub-portion at line 4 of Table 4 then compares
RA[4] to RA[5], again setting and clearing the appropriate status
flags.

If RA[2] equals RA[3], and RA[4] is greater than RA[5],
there is no need to test the remaining two portions (TEST2 and
TEST3) in the complex Boolean function, and the statement at
Table 4, line 5, will conditionally branch to the label DO_IF,
to perform the operation inside the "IF" of Table 3. However,
if the first portion of the test is failed, additional processing
is required to determine which of the "IF" and "ELSE" portions
should be executed.

The second portion of the Boolean function is the comparison
of RA[6] to RA[7], at line 6 of Table 4, which again sets and
clears the appropriate status flags. If the condition "less
than" is indicated by the status flags, the complex Boolean
function is passed, and execution may immediately branch to the
DO_IF label. In various prior microprocessors, the "less than"
condition may be tested by examining the minus flag. If RA[6]
was not less than RA[7], the third portion of the test must be
performed. The statement at line 8 of Table 4 compares RA[8] to
RA[9]. If this comparison is failed, the "ELSE" code should be
executed; otherwise, execution may simply fall through to the
"IF" code at line 10 of Table 4, which is followed by an
additional jump around the "ELSE" code. Each of the conditional
branches in Table 4, at lines 3, 5, 7 and 9, results in a
separate pipeline stall, significantly increasing the processing
time required for handling this complex Boolean function.
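
The control flow of Table 4 can be traced with a small sketch that
counts how many conditional branches are executed for a given set of
register values. The function name and the list used for the RA
registers are hypothetical, and the count is only a rough stand-in
for the pipeline stalls described above, not a timing model.

```python
def evaluate_with_branches(ra):
    """Follow the Table 4 flow; return (condition_passed, branches_executed)."""
    branches = 1                 # line 3: BNE TEST2 is always reached
    if ra[2] == ra[3]:
        branches += 1            # line 5: BGT DO_IF
        if ra[4] > ra[5]:
            return True, branches
    branches += 1                # line 7: BLT DO_IF
    if ra[6] < ra[7]:
        return True, branches
    branches += 1                # line 9: BEQ DO_ELSE
    return ra[8] != ra[9], branches


# Hypothetical register contents: only the last test succeeds.
ra = [0] * 11
ra[8] = 1
worst_case = evaluate_with_branches(ra)

# Here the first portion succeeds, so the flow leaves at line 5.
ra[4] = 5
early_exit = evaluate_with_branches(ra)
```

In the worst case all four conditional branches are executed before
the outcome is known.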

The greatly improved throughput which results from employing 
the Boolean register set C of the present invention will now 
readily be seen with specific reference to Table 5. 



TABLE 5
Execution of Complex Boolean Function
With Boolean Register Set

 1  START      LDI  RA[1],0
 2  TEST1      CMP  RC[11],RA[2],RA[3],EQ
 3             CMP  RC[12],RA[4],RA[5],GT
 4  TEST2      CMP  RC[13],RA[6],RA[7],LT
 5  TEST3      CMP  RC[14],RA[8],RA[9],NE
 6  COMPLEX    AND  RC[15],RC[11],RC[12]
 7             OR   RC[16],RC[13],RC[14]
 8             OR   RC[17],RC[15],RC[16]
 9             BC   RC[17],DO_ELSE
10  DO_IF      JSR  ADDRESS OF X()
11             JMP  PAST_ELSE
12  DO_ELSE    JSR  ADDRESS OF Y()
13  PAST_ELSE  LDI  RA[10],1



Most notably seen at lines 2-5 of Table 5, the Boolean
register set C allows the microprocessor to perform the three
test portions back-to-back without intervening branching. Each
Boolean comparison specifies two operands, a destination, and a
Boolean condition for which to test. For example, the comparison
at line 2 of Table 5 compares the contents of RA[2] to the
contents of RA[3], tests them for equality, and stores into
RC[11] the Boolean value of the result of the comparison. Note
that each comparison of the Boolean function stores its
respective intermediate results in a separate Boolean register.
As will be understood with reference to the above-referenced
related applications, the IEU 10 is capable of simultaneously
performing more than one of the comparisons.

After at least the first two comparisons at lines 2-3 of
Table 5 have been completed, the two respective comparison
results are AND-ed together as shown at line 6 of Table 5.
RC[15] then holds the result of the first portion of the test.
The results of the second and third portions of the Boolean
function are OR-ed together as seen in Table 5, line 7. It will
be understood that, because there are no data dependencies
involved, the AND at line 6 and the OR at line 7 may be
performed in parallel. Finally, the results of those two
operations are OR-ed together as seen at line 8 of Table 5. It
will be understood that register RC[17] will then contain a
Boolean value indicating the truth or falsity of the entire
complex Boolean function of Table 3. It is then possible to
perform a single conditional branch, shown at line 9 of Table 5.
In the mode shown in Table 5, the method branches to the "ELSE"
code if Boolean register RC[17] is clear, indicating that the
complex function was failed. The remainder of the code may be
the same as it was without the Boolean register set as seen in
Table 4.
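
The sequence of Table 5 can be mirrored in a short sketch in which
each comparison writes its result into its own Boolean register and a
single conditional branch consumes the final value. The function name
and the Python lists standing in for register sets A and C are
hypothetical; the sketch models only the data flow, not the parallel
issue of instructions.

```python
def evaluate_with_boolean_registers(ra):
    """Follow the Table 5 flow; return (condition_passed, branches_executed)."""
    rc = [False] * 32
    rc[11] = ra[2] == ra[3]      # line 2: CMP RC[11],RA[2],RA[3],EQ
    rc[12] = ra[4] > ra[5]       # line 3: CMP RC[12],RA[4],RA[5],GT
    rc[13] = ra[6] < ra[7]       # line 4: CMP RC[13],RA[6],RA[7],LT
    rc[14] = ra[8] != ra[9]      # line 5: CMP RC[14],RA[8],RA[9],NE
    rc[15] = rc[11] and rc[12]   # line 6: AND
    rc[16] = rc[13] or rc[14]    # line 7: OR, independent of line 6
    rc[17] = rc[15] or rc[16]    # line 8: OR
    branches = 1                 # line 9: BC is the only conditional branch
    return rc[17], branches


# Hypothetical register contents: only the last comparison is true.
ra = [0] * 11
ra[8] = 1
outcome = evaluate_with_boolean_registers(ra)
```

Whatever the register contents, exactly one conditional branch is
executed, because the comparisons never alter the flow of control.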

The Boolean functional unit 70 is responsive to the
instruction class, opcode, and function select fields as are the
other functional units. Thus, it will be understood with
reference to Table 5 again, that the integer and/or floating
point functional units will perform the instructions in lines 1-5
and 13, and the Boolean functional unit 70 will perform the
Boolean bitwise combination instructions in lines 6-8. The
control flow and branching instructions in lines 9-12 will be
performed by elements of the IEU 10 which are not shown in
Fig. 1.

III. DATA PATHS 

Figs. 2-5 illustrate further details of the data paths
within the floating point, integer, and Boolean portions of the
IEU, respectively.

A. Floating Point Portion Data Paths 

As seen in Fig. 2, the register set FB 20 is a multi-ported
register set. In one embodiment, the register set FB 20 has two
write ports WFB0-1, and five read ports RDFB0-4. The floating
point functional unit 68 of Fig. 1 is comprised of the ALU2 102,
FALU 104, MULT 106, and NULL 108 of Fig. 2. All elements of Fig.
2 except the register set 20 and the elements 102-108 comprise
the SMC unit B of Fig. 1.

External, bidirectional data bus EX_DATA[] provides data to
the floating point load/store unit 122. Immediate floating point
data bus LDF_IMED[] provides data from a "load immediate"
instruction. Other immediate floating point data are provided
on busses RFF1_IMED and RFF2_IMED, such as is involved in an "add
immediate" instruction. Data are also provided on bus
EX_SR_DT[], in response to a "special register move" instruction.
Data may also arrive from the integer portion, shown in Fig. 3,
on busses 114 and 120.

The floating point register set's two write ports WFB0 and
WFB1 are coupled to write multiplexers 110-0 and 110-1,
respectively. The write multiplexers 110 receive data from: the
ALU0 or SHF0 of the integer portion of Fig. 3; the FALU; the
MULT; the ALU2; either EX_SR_DT[] or LDF_IMED[]; and
EX_DATA[]. Those skilled in the art will understand that control
signals (not shown) determine which input is selected at each
port, and address signals (not shown) determine to which register
the input data are written. Multiplexer control and register
addressing are within the skill of persons in the art, and will
not be discussed for any multiplexer or register set in the
present invention.

The floating point register set's five read ports RDFB0 to
RDFB4 are coupled to read multiplexers 112-0 to 112-4,
respectively. The read multiplexers each also receive data
from: either EX_SR_DT[] or LDF_IMED[], on load immediate bypass
bus 126; a load external data bypass bus 127, which allows
external load data to skip the register set FB; the output of
the ALU2 102, which performs non-multiplication integer
operations; the FALU 104, which performs non-multiplication
floating point operations; the MULT 106, which performs
multiplication operations; and either the ALU0 140 or the SHF0
144 of the integer portion shown in Fig. 3, which respectively
perform non-multiplication integer operations and shift
operations. Read multiplexers 112-1 and 112-3 also receive data
from RFF1_IMED[] and RFF2_IMED[], respectively.

Each arithmetic-type unit 102-106 in the floating point
portion receives two inputs, from respective sets of first and
second source multiplexers S1 and S2. The first source of each
unit ALU2, FALU, and MULT comes from the output of either read
multiplexer 112-0 or 112-2, and the second source comes from the
output of either read multiplexer 112-1 or 112-3. The sources
of the FALU and the MULT may also come from the integer portion
of Fig. 3 on bus 114.

The results of the ALU2, FALU, and MULT are provided back
to the write multiplexers 110 for storage into the floating point
registers RF[], and also to the read multiplexers 112 for re-use
as operands of subsequent operations. The FALU also outputs a
signal FALU_BD indicating the Boolean result of a floating point
comparison operation. FALU_BD is calculated directly from
internal zero and sign flags of the FALU.

Null byte tester NULL 108 performs null byte testing
operations upon an operand from a first source multiplexer, in
one mode that of the ALU2. NULL 108 outputs a Boolean signal
NULLB_BD indicating whether the thirty-two-bit first source
operand includes a byte of value zero.
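
The null byte test can be expressed in a few lines of software. The
function name has_null_byte is hypothetical, and the loop over byte
positions is simply a convenient way to state the condition; the
hardware tester presumably examines all four bytes at once.

```python
def has_null_byte(word):
    """Return True if any of the four bytes of a thirty-two-bit word is zero."""
    word &= 0xFFFFFFFF  # keep only the low thirty-two bits
    return any(((word >> shift) & 0xFF) == 0 for shift in (0, 8, 16, 24))


low_byte_zero = has_null_byte(0x41424300)
no_zero_byte = has_null_byte(0x41424344)
middle_byte_zero = has_null_byte(0x41004344)
```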

The outputs of read multiplexers 112-0, 112-1, and 112-4 are
provided to the integer portion (of Fig. 3) on bus 118. The
output of read multiplexer 112-4 is also provided as STDT_FP[]
store data to the floating point load/store unit 122.

Fig. 5 illustrates further details of the control of the
S1 and S2 multiplexers. As seen, in one embodiment, each S1
multiplexer may be responsive to bit B1 of the instruction,
and each S2 multiplexer may be responsive to bit B2 of the
instruction I[]. The S1 and S2 multiplexers select the sources
for the various functional units. The sources may come from
either of the register files, as controlled by the B1 and B2 bits
of the instruction itself. Additionally, each register file
includes two read ports from which the sources may come, as
controlled by hardware not shown in the Figs.

B. Integer Portion Data Paths 

As seen in Fig. 3, the register set A 18 is also
multi-ported. In one embodiment, the register set A 18 has two
write ports WA0-1, and five read ports RDA0-4. The integer
functional unit 66 of Fig. 1 is comprised of the ALU0 140, ALU1
142, SHF0 144, and NULL 146 of Fig. 3. All elements of Fig. 3
except the register set 18 and the elements 140-146 comprise the
SMC unit A of Fig. 1.

External data bus EX_DATA[] provides data to the integer
load/store unit 152. Immediate integer data on bus LDI_IMED[]
are provided in response to a "load immediate" instruction.
Other immediate integer data are provided on busses RFA1_IMED and
RFA2_IMED in response to non-load immediate instructions, such
as an "add immediate". Data are also provided on bus EX_SR_DT[]
in response to a "special register move" instruction. Data may
also arrive from the floating point portion (shown in Fig. 2)
on busses 116 and 118.



The integer register set's two write ports WA0 and WA1 are
coupled to write multiplexers 148-0 and 148-1, respectively. The
write multiplexers 148 receive data from: the FALU or MULT of
the floating point portion (of Fig. 2); the ALU0; the ALU1;
the SHF0; either EX_SR_DT[] or LDI_IMED[]; and EX_DATA[].

The integer register set's five read ports RDA0 to RDA4 are
coupled to read multiplexers 150-0 to 150-4, respectively. Each
read multiplexer also receives data from: either EX_SR_DT[] or
LDI_IMED[] on load immediate bypass bus 160; a load external
data bypass bus 154, which allows external load data to skip the
register set A; ALU0; ALU1; SHF0; and either the FALU or the
MULT of the floating point portion (of Fig. 2). Read
multiplexers 150-1 and 150-3 also receive data from RFA1_IMED[]
and RFA2_IMED[], respectively.

Each arithmetic-type unit 140-144 in the integer portion
receives two inputs, from respective sets of first and second
source multiplexers S1 and S2. The first source of ALU0 comes
from either the output of read multiplexer 150-2, or a
thirty-two-bit wide constant zero (0000 hexadecimal), or floating
point read multiplexer 112-4. The second source of ALU0 comes from
either read multiplexer 150-3 or floating point read multiplexer
112-1. The first source of ALU1 comes from either read
multiplexer 150-0 or IF_PC[]. IF_PC[] is used in calculating a
return address needed by the instruction fetch unit (not shown),
due to the IEU's ability to perform instructions in an
out-of-order sequence. The second source of ALU1 comes from
either read multiplexer 150-1 or CF_OFFSET[]. CF_OFFSET[] is
used in calculating a return address for a CALL instruction, also
due to the out-of-order capability.

The first source of the shifter SHF0 144 is from either:
floating point read multiplexer 112-0 or 112-4; or any integer
read multiplexer 150. The second source of SHF0 is from either:
floating point read multiplexer 112-0 or 112-4; or integer read
multiplexer 150-0, 150-2, or 150-4. SHF0 takes a third input
from a shift amount multiplexer (SA). The third input controls
how far to shift, and is taken by the SA multiplexer from either:
floating point read multiplexer 112-1; integer read multiplexer
150-1 or 150-3; or a five-bit wide constant thirty-one (11111
binary, or 31 decimal). The shifter SHF0 requires a fourth input
from the size multiplexer (S). The fourth input controls how much
data to shift, and is taken by the S multiplexer from either: read
multiplexer 150-1; read multiplexer 150-3; or a five-bit wide
constant sixteen (10000 binary, or 16 decimal).

The results of the ALU0, ALU1, and SHF0 are provided back
to the write multiplexers 148 for storage into the integer
registers RA[], and also to the read multiplexers 150 for re-use
as operands of subsequent operations. The output of either ALU0
or SHF0 is provided on bus 120 to the floating point portion of
Fig. 2. The ALU0 and ALU1 also output signals ALU0_BD and
ALU1_BD, respectively, indicating the Boolean results of integer
comparison operations. ALU0_BD and ALU1_BD are calculated
directly from the zero and sign flags of the respective
functional units. ALU0 also outputs signals EX_TADR[] and
EX_VM_ADR[]. EX_TADR[] is the target address generated for an
absolute branch instruction, and is sent to the IFU (not shown)
for fetching the target instruction. EX_VM_ADR[] is the virtual
address used for all loads from memory and stores to memory, and
is sent to the VMU (not shown) for address translation.

Null byte tester NULL 146 performs null byte testing
operations upon an operand from a first source multiplexer. In
one embodiment, the operand is from the ALU0. NULL 146 outputs
a Boolean signal NULLA_BD indicating whether the thirty-two-bit
first source operand includes a byte of value zero.

The outputs of read multiplexers 150-0 and 150-1 are 
provided to the floating point portion (of Fig. 2) on bus 114. 
The output of read multiplexer 150-4 is also provided as 
STDT_INT[] store data to the integer load/store unit 152. 

A control bit PSR[7] is provided to the register set A 18.
It is this signal which, in Fig. 1, is provided from the mode
control unit 44 to the IEU mode integer switch 34 on line 46.
The IEU mode integer switch is internal to the register set A 18
as shown in Fig. 3.

Fig. 6 illustrates further details of the control of the S1
and S2 multiplexers. The signal ALU0_BD

C. Boolean Portion Data Paths

As seen in Fig. 4, the register set C 22 is also
multi-ported. In one embodiment, the register set C 22 has two
write ports WC0-1, and five read ports RDC0-4. All elements of
Fig. 4 except the register set 22 and the Boolean combinational
unit 70 comprise the SMC unit C of Fig. 1.

The Boolean register set's two write ports WC0 and WC1 are
coupled to write multiplexers 170-0 and 170-1, respectively. The
write multiplexers 170 receive data from: the output of the
Boolean combinational unit 70, indicating the Boolean result of
a Boolean combinational operation; ALU0_BD from the integer
portion of Fig. 3, indicating the Boolean result of an integer
comparison; FALU_BD from the floating point portion of Fig. 2,
indicating the Boolean result of a floating point comparison;
either ALU1_BD_P from ALU1, indicating the results of a compare
instruction in ALU1, or NULLA_BD from NULL 146, indicating a null
byte in the integer portion; and either ALU2_BD_P from ALU2,
indicating the results of a compare operation in ALU2, or
NULLB_BD from NULL 108, indicating a null byte in the floating
point portion. In one mode, the ALU0_BD, ALU1_BD, ALU2_BD, and
FALU_BD signals are not taken from the data paths, but are
calculated as a function of the zero flag, minus flag, carry
flag, and other condition flags in the PSR. In one mode, wherein
up to eight instructions may be executing at one instant in the
IEU, the IEU maintains up to eight PSRs.

The Boolean register set C is also coupled to bus
EX_SR_DT[], for use with "special register move" instructions.
The CSR may be written or read as a whole, as though it were a
single thirty-two-bit register. This enables rapid saving and
restoration of machine state information, such as may be
necessary upon certain drastic system errors or upon certain
forms of grand scale context switching.
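
Reading and writing the Boolean registers as one word can be sketched
as a pack and unpack pair. The sketch assumes thirty-two one-bit
Boolean registers and assumes that register RC[n] occupies bit n of
the word; the bit ordering is not stated in the text, and the
function names are hypothetical.

```python
def pack_csr(rc):
    """Pack thirty-two Boolean registers into one thirty-two-bit word."""
    if len(rc) != 32:
        raise ValueError("expected thirty-two Boolean registers")
    word = 0
    for index, flag in enumerate(rc):
        if flag:
            word |= 1 << index  # assumption: RC[n] maps to bit n
    return word


def unpack_csr(word):
    """Restore thirty-two Boolean registers from a thirty-two-bit word."""
    return [bool((word >> index) & 1) for index in range(32)]


# Save the Boolean state in one move, then restore it later.
rc = [False] * 32
rc[17] = True
saved = pack_csr(rc)
restored = unpack_csr(saved)
```

A single word move in each direction is enough to save or restore
every Boolean register in this model.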

The Boolean register set's five read ports RDC0 to RDC4 are
coupled to read multiplexers 172-0 to 172-4, respectively. The
read multiplexers 172 receive the same set of inputs as the write
multiplexers 170 receive. The Boolean combinational unit 70
receives inputs from read multiplexers 172-0 and 172-1. Read
multiplexers 172-2 and 172-3 respectively provide signals
BLBP_CPORT and BLBP_DPORT. BLBP_CPORT is used as the basis for
conditional branching instructions in the IEU. BLBP_DPORT is
used in the "add with Boolean" instruction, which sets an integer
register in the A or B set to zero or one (with leading zeroes),
depending upon the content of a register in the C set. Read port
RDC4 is presently unused, and is reserved for future enhancements
of the Boolean functionality of the IEU.

IV. CONCLUSION 

While the features and advantages of the present invention 
have been described with respect to particular embodiments 
thereof, and in varying degrees of detail, it will be appreciated 
that the invention is not limited to the described embodiments. 
The following Claims define the invention to be afforded patent
coverage.